Ovaj projekat predstavlja detaljnu analizu ekoloških parametara prikupljenih sa različitih lokacija. Cilj analize je da se razumeju ključni ekološki faktori koji utiču na biodiverzitet mahovina u ekosistemima. Kroz ovaj projekat, biće primenjene metode deskriptivne statistike, vizualizacije podataka, kao i napredne statističke analize poput Generalizovanih Linearnih Modela (GLM) i Kanonske Korelacione Analize (CCA).
Dokument je organizovan tako da pruži jasan i logičan pregled svih faza projekta, od pripreme podataka do analize i zaključaka.
if (!require("readxl")) install.packages("readxl")
## Loading required package: readxl
if (!require("ggplot2")) install.packages("ggplot2")
## Loading required package: ggplot2
if (!require("dplyr")) install.packages("dplyr")
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
if (!require("tidyr")) install.packages("tidyr")
## Loading required package: tidyr
if (!require("broom")) install.packages("broom")
## Loading required package: broom
# Učitavanje potrebnih biblioteka
library(readxl)
library(dplyr)
library(ggplot2)
library(tidyr)
library(broom)
library(igraph)
##
## Attaching package: 'igraph'
## The following object is masked from 'package:tidyr':
##
## crossing
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
# Učitavanje Excel fajla
data <- read_excel("Env.xlsx")
## New names:
## • `` -> `...1`
# Prikaz strukture podataka
str(data)
## tibble [35 × 10] (S3: tbl_df/tbl/data.frame)
## $ ...1: chr [1:35] "ST1" "ST2" "ST3" "ST4" ...
## $ pH : num [1:35] 5.1 4.8 4.9 5 4.9 4.5 4.2 4.3 4.5 4.6 ...
## $ sm : num [1:35] 31.2 26.3 25.5 23.4 21.4 ...
## $ st : num [1:35] 18 19 19 18 18 17 18 18 19 17 ...
## $ hc : num [1:35] 2 3 7 4 4 15 12 10 1 12 ...
## $ lc : num [1:35] 4 7 8 9 9 8 7 12 10 11 ...
## $ sd : num [1:35] 1 4 7 9 10 0 0 0 0 0 ...
## $ tn : num [1:35] 8 12 14 16 15 9 11 14 19 11 ...
## $ bn : num [1:35] 4 1 5 3 2 4 8 9 2 1 ...
## $ ft : num [1:35] 1 1 1 1 1 1 1 1 1 1 ...
# Prikaz prvih nekoliko redova podataka
head(data)
## # A tibble: 6 × 10
## ...1 pH sm st hc lc sd tn bn ft
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ST1 5.1 31.2 18 2 4 1 8 4 1
## 2 ST2 4.8 26.3 19 3 7 4 12 1 1
## 3 ST3 4.9 25.5 19 7 8 7 14 5 1
## 4 ST4 5 23.4 18 4 9 9 16 3 1
## 5 ST5 4.9 21.4 18 4 9 10 15 2 1
## 6 PD1 4.5 20.8 17 15 8 0 9 4 1
Deskriptivna statistika sažima i prikazuje skup podataka, pružajući uvid u centralnu tendenciju i rasejanje. Ove statističke tehnike sažimaju i opisuju podatke, pružajući korisne uvide. Pored minimalnih i maksimalnih vrednosti, u obzir smo uzimali i:
Srednju vrednost (Prosek): Aritmetička sredina skupa vrednosti. Predstavlja centralnu vrednost oko koje se podaci grupišu.
Standardnu devijaciju: Mera rasprostranjenosti podataka oko srednje vrednosti.
summary(data)
## ...1 pH sm st
## Length:35 Min. :4.100 Min. :10.42 Min. :15.0
## Class :character 1st Qu.:4.200 1st Qu.:13.20 1st Qu.:17.0
## Mode :character Median :4.500 Median :21.20 Median :18.0
## Mean :4.694 Mean :20.63 Mean :17.4
## 3rd Qu.:4.850 3rd Qu.:25.93 3rd Qu.:18.0
## Max. :6.600 Max. :40.00 Max. :19.0
## hc lc sd tn bn
## Min. : 1.000 Min. : 2.0 Min. : 0.0 Min. : 8.00 Min. :0.000
## 1st Qu.: 4.000 1st Qu.: 8.0 1st Qu.: 0.0 1st Qu.:13.50 1st Qu.:1.000
## Median : 7.000 Median :14.0 Median : 0.0 Median :19.00 Median :2.000
## Mean : 7.286 Mean :16.8 Mean : 1.8 Mean :18.34 Mean :2.629
## 3rd Qu.: 8.500 3rd Qu.:26.5 3rd Qu.: 2.5 3rd Qu.:23.50 3rd Qu.:4.000
## Max. :21.000 Max. :36.0 Max. :10.0 Max. :29.00 Max. :9.000
## ft
## Min. :1.000
## 1st Qu.:1.000
## Median :1.000
## Mean :1.429
## 3rd Qu.:2.000
## Max. :2.000
numeric_cols <- sapply(data, is.numeric)
numeric_data <- data[, numeric_cols]
summary_stats <- apply(numeric_data, 2, function(x) c(min(x), max(x), mean(x), sd(x)))
summary_stats
## pH sm st hc lc sd tn
## [1,] 4.1000000 10.420000 15.000000 1.000000 2.00000 0.000000 8.000000
## [2,] 6.6000000 40.000000 19.000000 21.000000 36.00000 10.000000 29.000000
## [3,] 4.6942857 20.627429 17.400000 7.285714 16.80000 1.800000 18.342857
## [4,] 0.7235301 8.035562 1.005865 4.409691 10.71777 3.007833 6.082486
## bn ft
## [1,] 0.000000 1.0000000
## [2,] 9.000000 2.0000000
## [3,] 2.628571 1.4285714
## [4,] 2.533142 0.5020964
numeric_data <- data %>% select(where(is.numeric))
summary_stats <- numeric_data %>%
summarise_all(list(
mean = ~mean(. , na.rm = TRUE),
max = ~max(. , na.rm = TRUE),
min = ~min(. , na.rm = TRUE),
sd = ~sd(. , na.rm = TRUE)
)) %>%
pivot_longer(cols = everything(), names_to = c("variable", ".value"), names_sep = "_")
summary_stats_long <- summary_stats %>%
pivot_longer(cols = -variable, names_to = "statistic", values_to = "value")
ggplot(summary_stats, aes(x = variable)) +
geom_errorbar(aes(ymin = min, ymax = max, color = "Range (Min to Max)"), width = 0.2) +
geom_point(aes(y = mean, color = "Mean"), size = 3) +
geom_point(aes(y = mean + sd, color = "Mean + SD"), shape = 4, size = 3) +
geom_point(aes(y = mean - sd, color = "Mean - SD"), shape = 4, size = 3) +
scale_color_manual(
name = "Statistike",
values = c("Range (Min to Max)" = "black", "Mean" = "blue", "Mean + SD" = "red", "Mean - SD" = "red")
) +
labs(
title = "Grafikoni za deskriptivnu statistiku za svaki parametar",
x = "Varijabla",
y = "Vrednost"
) +
theme_minimal()
data$category <- ifelse(substr(data[[1]], 1, 2) == "ST", "Strazilovo",
ifelse(substr(data[[1]], 1, 1) == "D", "Dumbovo",
ifelse(substr(data[[1]], 1, 1) == "V", "Vrdnik",
ifelse(substr(data[[1]], 1, 2) == "LV", "Lazin Vir",
ifelse(substr(data[[1]], 1, 1) == "J", "Jazak",
ifelse(substr(data[[1]], 1, 2) == "KB", "Kanov Breg", "Papratski Do")
)
)
)
)
)
data_numeric <- data %>% select_if(is.numeric) %>% cbind(category = data$category)
data_long <- data_numeric %>% pivot_longer(cols = -category, names_to = "variable", values_to = "value")
ggplot(data_long, aes(x = category, y = value)) +
geom_boxplot() +
facet_wrap(~ variable, scales = "free_y") +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
panel.spacing = unit(2, "lines")) +
labs(title = "Box plot svaku lokaciju", x = "Lokacija", y = "Vrednost")
Generalizovani linearni model (GLM) je opšti okvir u statistici koji se koristi za modelovanje odnosa između zavisne promenljive (ishoda) i jedne ili više nezavisnih promenljivih (prediktora). GLM proširuje klasični linearni model (LM) tako da može da se bavi situacijama gde su pretpostavke o normalnoj raspodeli greške i linearnosti odnosa neadekvatne. GLM se sastoji od tri glavna dela:
1. Linearni prediktor: Kombinacija nezavisnih promenljivih i njihovih koeficijenata. U matematičkom obliku, to je: η=Xβ gde je η linearni prediktor, X je matrica nezavisnih promenljivih, a β je vektor koeficijenata.
2. Funkcija veze (link funkcija): Funkcija koja povezuje očekivanu vrednost zavisne promenljive sa linearnim prediktorom. Neke često korišćene link funkcije uključuju:
3. Porodična funkcija raspodele: Vrsta raspodele koja opisuje zavisnu promenljivu. GLM može koristiti različite porodice raspodela, kao što su:
Normalna raspodela (Koristi se za kontinuirane zavisne promenljive sa konstantnom varijancom.).
Binomna raspodela (za binarne podatke).
Poisson-ova raspodela (za brojanja ili retke događaje).
Gamma raspodela (za pozitivne kontinuirane podatke, naročito kada je varijanca proporcionalna kvadratu srednje vrednosti).
U većini statističkih softvera i programskih jezika kao što su R, Python i SAS, postoje ugrađene funkcije za fitovanje GLM-a. Na primer, u R-u se koristi funkcija \(glm()\) iz osnovne statističke biblioteke čiji izlaz uključuje procene koeficijenata modela, standardnih grešaka, p-vrednosti i statistike dobrog uklapanja.
GLM model je korišćen nad podacima iz tabele za utvrđivanje uticaja svake od datih promenljivih na Shannon-ov indeks diverziteta flore mahovina na lokalitetima koji su prethodno navedeni.
Iz rezultata koje smo dobili možemo zaključiti da na indeks diverziteta utiče vlažnost zemljišta, pH površinskog sloja zemljišta i pokrovnost zeljastih vaskularnih biljaka. Shannon-ov indeks diverziteta se povećava sa povećanjem pH vrednosti zemljišta, sa povećanjem vlažnosti zemljišta i žbunastih biljaka.
data <- read_excel("EnvV.xlsx")
ggplot(data, aes(x = Shannon, y = tn)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
geom_smooth(method = "glm", method.args = list(family = gaussian), color = "blue", linetype = "solid", size = 1.5) +
labs(x = "Shannon", y = "Broj drvenastih biljaka") +
ggtitle("Generalized Linear Model") +
theme_classic(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
axis.title.x = element_text(face = "italic", size = 14),
axis.title.y = element_text(face = "italic", size = 14),
axis.text = element_text(size = 12),
axis.line = element_line(color = "grey15", size = 1),
panel.grid.major = element_line(color = "gray78", size = 0.5),
panel.grid.minor = element_blank()
)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using formula = 'y ~ x'
data <- read_excel("EnvV.xlsx")
ggplot(data, aes(x = Shannon, y = pH)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
geom_smooth(method = "glm", method.args = list(family = gaussian), color = "blue", linetype = "solid", size = 1.5) +
labs(x = "Shannon", y = "pH vrednost zemljišta") +
ggtitle("Generalized Linear Model") +
theme_classic(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
axis.title.x = element_text(face = "italic", size = 14),
axis.title.y = element_text(face = "italic", size = 14),
axis.text = element_text(size = 12),
axis.line = element_line(color = "grey15", size = 1),
panel.grid.major = element_line(color = "gray78", size = 0.5),
panel.grid.minor = element_blank()
)
## `geom_smooth()` using formula = 'y ~ x'
data <- read_excel("EnvV.xlsx")
ggplot(data, aes(x = Shannon, y = sm)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
geom_smooth(method = "glm", method.args = list(family = gaussian), color = "blue", linetype = "solid", size = 1.5) +
labs(x = "Shannon", y = "vlažnost zemljišta") +
ggtitle("Generalized Linear Model") +
theme_classic(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
axis.title.x = element_text(face = "italic", size = 14),
axis.title.y = element_text(face = "italic", size = 14),
axis.text = element_text(size = 12),
axis.line = element_line(color = "grey15", size = 1),
panel.grid.major = element_line(color = "gray78", size = 0.5),
panel.grid.minor = element_blank()
)
## `geom_smooth()` using formula = 'y ~ x'
data <- read_excel("EnvV.xlsx")
ggplot(data, aes(x = Shannon, y = st)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
geom_smooth(method = "glm", method.args = list(family = gaussian), color = "blue", linetype = "solid", size = 1.5) +
labs(x = "Shannon", y = "temperatura zemljišta") +
ggtitle("Generalized Linear Model") +
theme_classic(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
axis.title.x = element_text(face = "italic", size = 14),
axis.title.y = element_text(face = "italic", size = 14),
axis.text = element_text(size = 12),
axis.line = element_line(color = "grey15", size = 1),
panel.grid.major = element_line(color = "gray78", size = 0.5),
panel.grid.minor = element_blank()
)
## `geom_smooth()` using formula = 'y ~ x'
data <- read_excel("EnvV.xlsx")
ggplot(data, aes(x = Shannon, y = hc)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
geom_smooth(method = "glm", method.args = list(family = gaussian), color = "blue", linetype = "solid", size = 1.5) +
labs(x = "Shannon", y = "pokrovnost zeljastih biljaka") +
ggtitle("Generalized Linear Model") +
theme_classic(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
axis.title.x = element_text(face = "italic", size = 14),
axis.title.y = element_text(face = "italic", size = 14),
axis.text = element_text(size = 12),
axis.line = element_line(color = "grey15", size = 1),
panel.grid.major = element_line(color = "gray78", size = 0.5),
panel.grid.minor = element_blank()
)
## `geom_smooth()` using formula = 'y ~ x'
data <- read_excel("EnvV.xlsx")
ggplot(data, aes(x = Shannon, y = lc)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
geom_smooth(method = "glm", method.args = list(family = gaussian), color = "blue", linetype = "solid", size = 1.5) +
labs(x = "Shannon", y = "pokrovnost stelje") +
ggtitle("Generalized Linear Model") +
theme_classic(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
axis.title.x = element_text(face = "italic", size = 14),
axis.title.y = element_text(face = "italic", size = 14),
axis.text = element_text(size = 12),
axis.line = element_line(color = "grey15", size = 1),
panel.grid.major = element_line(color = "gray78", size = 0.5),
panel.grid.minor = element_blank()
)
## `geom_smooth()` using formula = 'y ~ x'
data <- read_excel("EnvV.xlsx")
ggplot(data, aes(x = Shannon, y = sd)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
geom_smooth(method = "glm", method.args = list(family = gaussian), color = "blue", linetype = "solid", size = 1.5) +
labs(x = "Shannon", y = "udaljenost od potoka") +
ggtitle("Generalized Linear Model") +
theme_classic(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
axis.title.x = element_text(face = "italic", size = 14),
axis.title.y = element_text(face = "italic", size = 14),
axis.text = element_text(size = 12),
axis.line = element_line(color = "grey15", size = 1),
panel.grid.major = element_line(color = "gray78", size = 0.5),
panel.grid.minor = element_blank()
)
## `geom_smooth()` using formula = 'y ~ x'
data <- read_excel("EnvV.xlsx")
ggplot(data, aes(x = Shannon, y = bn)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
geom_smooth(method = "glm", method.args = list(family = gaussian), color = "blue", linetype = "solid", size = 1.5) +
labs(x = "Shannon", y = "broj žbunastih biljaka") +
ggtitle("Generalized Linear Model") +
theme_classic(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
axis.title.x = element_text(face = "italic", size = 14),
axis.title.y = element_text(face = "italic", size = 14),
axis.text = element_text(size = 12),
axis.line = element_line(color = "grey15", size = 1),
panel.grid.major = element_line(color = "gray78", size = 0.5),
panel.grid.minor = element_blank()
)
## `geom_smooth()` using formula = 'y ~ x'
data <- read_excel("EnvV.xlsx")
ggplot(data, aes(x = Shannon, y = ft)) +
geom_point(color = "red", size = 3, alpha = 0.7) +
geom_smooth(method = "glm", method.args = list(family = gaussian), color = "blue", linetype = "solid", size = 1.5) +
labs(x = "Shannon", y = "tip šume") +
ggtitle("Generalized Linear Model") +
theme_classic(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
axis.title.x = element_text(face = "italic", size = 14),
axis.title.y = element_text(face = "italic", size = 14),
axis.text = element_text(size = 12),
axis.line = element_line(color = "grey15", size = 1),
panel.grid.major = element_line(color = "gray78", size = 0.5),
panel.grid.minor = element_blank()
)
## `geom_smooth()` using formula = 'y ~ x'
Kanonska korelaciona analiza (CCA) je statistička metoda koja se koristi za istraživanje veza između dva skupa promenljivih. Cilj CCA je da identifikuje i kvantifikuje parove kanonskih varijabli (linearnih kombinacija originalnih promenljivih iz oba skupa) koje imaju najveći mogući korelacioni koeficijent.
# Učitavamo paket za rad sa Excel fajlovima
library(readxl)
# Učitavamo podatke iz Excel fajlova
env <- read_excel("Env.xlsx")
## New names:
## • `` -> `...1`
spec <- read_excel("Spec.xlsx")
# Menjamo ime kolone 'A' u 'site'
colnames(env)[1] <- "site"
head(env)
## # A tibble: 6 × 10
## site pH sm st hc lc sd tn bn ft
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ST1 5.1 31.2 18 2 4 1 8 4 1
## 2 ST2 4.8 26.3 19 3 7 4 12 1 1
## 3 ST3 4.9 25.5 19 7 8 7 14 5 1
## 4 ST4 5 23.4 18 4 9 9 16 3 1
## 5 ST5 4.9 21.4 18 4 9 10 15 2 1
## 6 PD1 4.5 20.8 17 15 8 0 9 4 1
head(spec)
## # A tibble: 6 × 55
## Abi_abi Amb_ser Ano_att Atr_ang Atr_und Baz_tri Bra_gla Bra_riv Bra_rut
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 4 0 2 1 9 0 0 0 10
## 2 3 0 3 1 7 0 0 0 7
## 3 2 0 3 1 3 0 0 0 9
## 4 0 0 2 1 1 0 0 0 4
## 5 0 0 1 1 1 0 0 0 4
## 6 8 5 0 0 1 0 0 0 14
## # ℹ 46 more variables: Bra_vel <dbl>, Cal_lin <dbl>, Cep_stel <dbl>,
## # Cer_pur <dbl>, Chi_pal <dbl>, Cra_fil <dbl>, Dic_het <dbl>, Dic_sco <dbl>,
## # Did_spa <dbl>, `Eur _pul` <dbl>, Eur_str <dbl>, Fis_dub <dbl>,
## # Fis_tax <dbl>, Fun_hyg <dbl>, Hyg_ten <dbl>, Hyg_var <dbl>, Hyp_cal <dbl>,
## # Hyp_cup <dbl>, Hyp_rev <dbl>, Jun_atr <dbl>, Kin_pra <dbl>, Les_mut <dbl>,
## # Met_con <dbl>, Mic_pum <dbl>, Mni_mar <dbl>, Oxy_hia <dbl>, Oxy_sch <dbl>,
## # Pla_aff <dbl>, Pla_cav <dbl>, Pla_cur <dbl>, Pla_den <dbl>, …
site_labels <- env$site # Čuvamo nazive lokacija u posebnoj varijabli kako bismo mogli lako da im pristupimo kasnije
env_numeric <- env[, -1] # Kreiramo novi data frame-a bez kolone 'site' koji ćemo koristiti u analizi ---> varijable koje ulaze u CCA moraju biti numeričke
# Učitavamo vegan paket
library(vegan)
## Loading required package: permute
##
## Attaching package: 'permute'
## The following object is masked from 'package:igraph':
##
## permute
## Loading required package: lattice
## This is vegan 2.6-6
##
## Attaching package: 'vegan'
## The following object is masked from 'package:igraph':
##
## diversity
# Izvršavanje CCA
cca_model <- cca(spec ~ ., data = env_numeric)
# Prikaz rezultata
summary(cca_model)
##
## Call:
## cca(formula = spec ~ pH + sm + st + hc + lc + sd + tn + bn + ft, data = env_numeric)
##
## Partitioning of scaled Chi-square:
## Inertia Proportion
## Total 3.857 1.0000
## Constrained 2.419 0.6273
## Unconstrained 1.437 0.3727
##
## Eigenvalues, and their contribution to the scaled Chi-square
##
## Importance of components:
## CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 CCA7
## Eigenvalue 0.9092 0.4577 0.30395 0.26126 0.22249 0.10044 0.08125
## Proportion Explained 0.2357 0.1187 0.07881 0.06774 0.05769 0.02604 0.02107
## Cumulative Proportion 0.2357 0.3544 0.43323 0.50097 0.55865 0.58469 0.60576
## CCA8 CCA9 CA1 CA2 CA3 CA4 CA5
## Eigenvalue 0.05702 0.026068 0.24621 0.2067 0.1913 0.16911 0.12899
## Proportion Explained 0.01478 0.006759 0.06384 0.0536 0.0496 0.04384 0.03344
## Cumulative Proportion 0.62054 0.627301 0.69114 0.7447 0.7943 0.83818 0.87163
## CA6 CA7 CA8 CA9 CA10 CA11 CA12
## Eigenvalue 0.10765 0.07616 0.05539 0.04788 0.04429 0.04066 0.027519
## Proportion Explained 0.02791 0.01975 0.01436 0.01241 0.01148 0.01054 0.007135
## Cumulative Proportion 0.89954 0.91928 0.93364 0.94606 0.95754 0.96808 0.975218
## CA13 CA14 CA15 CA16 CA17 CA18
## Eigenvalue 0.022709 0.018877 0.016448 0.013703 0.010339 0.006324
## Proportion Explained 0.005888 0.004894 0.004265 0.003553 0.002681 0.001640
## Cumulative Proportion 0.981106 0.986000 0.990264 0.993817 0.996498 0.998138
## CA19 CA20 CA21 CA22 CA23
## Eigenvalue 0.0033512 0.0024168 0.0010535 3.317e-04 3.001e-05
## Proportion Explained 0.0008689 0.0006266 0.0002731 8.601e-05 7.780e-06
## Cumulative Proportion 0.9990065 0.9996331 0.9999062 1.000e+00 1.000e+00
##
## Accumulated constrained eigenvalues
## Importance of components:
## CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 CCA7
## Eigenvalue 0.9092 0.4577 0.3040 0.2613 0.22249 0.10044 0.08125
## Proportion Explained 0.3758 0.1892 0.1256 0.1080 0.09196 0.04152 0.03358
## Cumulative Proportion 0.3758 0.5650 0.6906 0.7986 0.89056 0.93208 0.96566
## CCA8 CCA9
## Eigenvalue 0.05702 0.02607
## Proportion Explained 0.02357 0.01077
## Cumulative Proportion 0.98923 1.00000
library(ggplot2)
eigenvalues <- c(0.9092, 0.4577, 0.30395, 0.26126, 0.22249, 0.10044, 0.08125, 0.05702, 0.026068)
eigen_df <- data.frame(Axis = 1:9, Eigenvalue = eigenvalues)
ggplot(eigen_df, aes(x = Axis, y = Eigenvalue)) +
geom_line() + geom_point() + scale_x_continuous(breaks = 1:9) +
labs(title = "Grafik sopstvenih vrednosti", x = "Kanoničke ose", y = "Eigenvalue")
# Originalne proporcije
proportions <- c(0.2357, 0.1187, 0.07881, 0.06774, 0.05769, 0.02604, 0.02107, 0.01478, 0.006759)
# Proširivanje proporcija za ose koje nedostaju
full_proportions <- c(proportions, rep(0, 9 - length(proportions)))
# Kreiranje dataframe-a
prop_df <- data.frame(Axis = 1:9, Proportion = full_proportions)
# Crtanje grafika
ggplot(prop_df, aes(x = Axis, y = Proportion)) +
geom_bar(stat = "identity", fill = "blue") +
labs(title = "Proporcija objašnjene varijanse po osama", x = "Kanoničke ose", y = "Proporcija objašnjene varijanse") + scale_x_continuous(breaks = 1:9, labels = as.character(1:9))
# Prikaz težinskih koeficijenata za CCA1
coef(cca_model, choices = 1)
## CCA1 CCA2 CCA3 CCA4 CCA5 CCA6
## pH -0.021568127 -0.717810015 -0.050808914 -1.56194558 1.36991871 0.40079686
## sm -0.005229556 0.235009940 -0.149858384 0.11764266 -0.19461944 -0.14253317
## st 0.037159113 -0.250798695 0.222559202 -0.32820991 0.44247796 0.06738434
## hc 0.001871809 0.001934809 0.101969002 -0.07204793 -0.12267481 0.09031035
## lc -0.012522313 0.057655926 -0.042903293 0.06724937 -0.08979903 -0.29597515
## sd 0.005175843 -0.170810305 -0.322025786 0.06452038 -0.10792012 0.06011225
## tn 0.002115552 0.044923921 0.009216437 -0.07163860 -0.23190221 0.06169296
## bn -0.010589363 -0.020790787 -0.147600685 0.01930701 -0.05707623 0.29604557
## ft -2.095373600 0.298291821 -2.812723911 -0.23003484 2.29959534 4.79366663
## CCA7 CCA8 CCA9
## pH 1.06104014 1.94733455 -2.417258650
## sm -0.22697510 -0.19208667 0.139503343
## st -0.49736059 -0.74146586 -0.424322984
## hc -0.23364409 0.03101909 0.024706396
## lc 0.06202363 0.03673342 -0.129800726
## sd -0.12016775 0.05965681 0.006502796
## tn -0.01311249 -0.31592620 0.190254431
## bn 0.14145349 -0.06469528 -0.512956352
## ft -3.88915075 0.99948502 -1.136950206
# Izvlačenje koeficijenata za prvu kanonsku osu i pravljenje data frame-a
cca_coef <- coef(cca_model, choices = "CCA1") # Pobrinite se da izvlačite pravu osu
coef_df <- as.data.frame(cca_coef[, 1, drop = FALSE]) # Pretvara koeficijente u data frame
names(coef_df) <- "Coef" # Imenuje kolonu
coef_df$Variables <- rownames(coef_df) # Dodaje varijable kao kolonu
# Kreiranje bar grafikona koristeći ggplot2
p <- ggplot(coef_df, aes(x = Variables, y = Coef)) +
geom_bar(stat = "identity", fill = "steelblue") +
theme_minimal() +
labs(title = "Koeficijenti za CCA1", x = "Ekološki Faktori", y = "Vrednost Koeficijenta") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotiranje labela za bolju čitljivost
# Prikaz grafikona
print(p)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:igraph':
##
## groups
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(vegan)
library(stringr)
# Dobijanje skorova za lokacije i biplot skorova za ekološke faktore
site_scores <- scores(cca_model, display = "sites")
species_scores <- scores(cca_model, display = "species")
env_factors <- scores(cca_model, display = "bp", scaling = 3) # Ekološki faktori, povećano skaliranje
# Kreiranje grupacija lokacija
grouped_site_labels <- str_extract(site_labels, "[A-Za-z]+")
group_colors <- c("ST" = "red", "PD" = "green", "LV" = "yellow", "KB" = "pink", "J" = "blueviolet", "V" = "cyan", "D" = "orange" )
colors <- group_colors[grouped_site_labels]
# colors <- RColorBrewer::brewer.pal(max(length(unique(grouped_site_labels))), "Set3") # Generiše palete boja
# Kreiranje Plotly grafika
p <- plot_ly() %>%
add_markers(
x = site_scores[,1], y = site_scores[,2],
color = as.factor(grouped_site_labels), colors = colors,
text = grouped_site_labels, # Tekst za prelazak mišem
marker = list(size = 10),
name = ~grouped_site_labels # Dinamičko dodeljivanje imena za legendu
)
# Dodavanje vektora za svaki ekološki faktor
for (i in 1:nrow(env_factors)) {
factor_name <- rownames(env_factors)[i]
p <- p %>%
add_segments(
x = 0, xend = env_factors[i, 1] * 2, # Dužina linije
y = 0, yend = env_factors[i, 2] * 2,
line = list(color = 'blue', width = 2),
name = factor_name, showlegend = FALSE
) %>%
add_annotations(
x = env_factors[i, 1] * 2, y = env_factors[i, 2] * 2,
text = factor_name, showarrow = FALSE, xanchor = 'center', yanchor = 'bottom',
showlegend = FALSE
)
}
# Podešavanje layout-a
p <- p %>%
layout(
title = 'Grupisani CCA Biplot sa Ekološkim Faktorima',
xaxis = list(title = 'CCA1'),
yaxis = list(title = 'CCA2'),
hovermode = 'closest',
legend = list(title = "Grupacije Lokacija", x = 1.05, y = 1)
)
# Prikazivanje grafika
p
## Warning: Duplicate levels detected
## Duplicate levels detected
# Kreiranje Plotly grafika za vrste sa labelama
p_species <- plot_ly() %>%
add_markers(
x = species_scores[,1], y = species_scores[,2],
text = rownames(species_scores), # Tekst za prelazak mišem (imena vrsta)
marker = list(size = 10, color = 'red'), # Boja tačaka za vrste
name = "Species" # Ime za legendu
) %>%
add_text(
x = species_scores[,1], y = species_scores[,2],
text = rownames(species_scores),
textposition = 'top right',
showlegend = FALSE
)
# Dodavanje vektora za svaki ekološki faktor (kao što je urađeno za lokacije)
for (i in 1:nrow(env_factors)) {
factor_name <- rownames(env_factors)[i]
p_species <- p_species %>%
add_segments(
x = 0, xend = env_factors[i, 1] * 2, # Dužina linije
y = 0, yend = env_factors[i, 2] * 2,
line = list(color = 'blue', width = 2),
name = factor_name, showlegend = FALSE
) %>%
add_annotations(
x = env_factors[i, 1] * 2, y = env_factors[i, 2] * 2,
text = factor_name, showarrow = FALSE, xanchor = 'center', yanchor = 'bottom',
showlegend = FALSE
)
}
# Podešavanje layout-a za graf sa vrstama
p_species <- p_species %>%
layout(
title = 'Grupisani CCA Biplot sa Ekološkim Faktorima i Vrstama',
xaxis = list(title = 'CCA1'),
yaxis = list(title = 'CCA2'),
hovermode = 'closest',
legend = list(title = "Vrste", x = 1.05, y = 1)
)
# Prikazivanje grafika
p_species
Label Propagation
je algoritam iz oblasti mašinskog učenja i teorije grafova koji se koristi za rešavanje problema klasifikacije. Lp- algoritam je tip delimično-nadglednih algoritama jer koristi i označene (labelisane) i neoznačene podatke (nelabelisane) za treniranje modela.
Osnove algoritma:
Predstavljanje podataka kao graf - gde čvorovi predstavljaju podatke, a grane predstavljaju sličnosti među podacima
Inicijalizacija labela - Na početku čvorovi koji imaju poznate labele zadržavaju te labele, a oni koji nemaju su ili neoznačeni ili označeni sa nulom
Propagacija algoritma - Algoritam iterativno aržurira labele čvorova . Tokom svake iteracije svaki čvor aržurira svoj label na odnosu labela svojih suseda, često koristimo neki oblik glasanja tj. To znači da će čvor uzeti najučestaliji laben medju svojim susedima uzimajući u obzir težine grana
Algoritam se zaustavlja kada se labele stabilizuju tj. prestanu menjati izmedju iteracija ili nakon dostignutog unapred zadatog broja iteracija
#ucitavamo podatke
nodes<-read_excel("nodes.xlsx")
edges <- read_excel("edges.xlsx")
#pravimo graf
g <- graph_from_data_frame(d = edges, vertices = nodes, directed = FALSE)
set.seed(21)
# Set node attributes (assuming 'nodes' has columns 'name' and 'abudance')
V(g)$abudance <-nodes$Abudance
min_size <- 0.5
max_size <- 1
V(g)$size <- scales::rescale(V(g)$abudance, to = c(min_size, max_size))
# Perform label propagation
communities <- label.propagation.community(g,weights = E(g)$weight)
## Warning: `label.propagation.community()` was deprecated in igraph 2.0.0.
## ℹ Please use `cluster_label_prop()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
cat("Broj detektovanih zajednica", length(communities))
## Broj detektovanih zajednica 9
sizes(communities)
## Community sizes
## 1 2 3 4 5 6 7 8 9
## 4 4 3 22 10 2 5 2 2
plot(g, vertex.color = communities$membership, vertex.size = V(g)$abudance, main = "Label Propagation")